Learning Multi-Modal Word Representation Grounded in Visual Context

Authors

  • Eloi Zablocki
  • Benjamin Piwowarski
  • Laure Soulier
  • Patrick Gallinari
Abstract

Representing the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics given their textual context in large corpora. More recently, researchers have attempted to integrate perceptual and visual features. Most of these works consider the visual appearance of objects to enhance word representations, but they ignore the visual environment and context in which objects appear. We propose to unify text-based techniques with vision-based techniques by simultaneously leveraging textual and visual context to learn multimodal word embeddings. We explore various choices for what can serve as a visual context and present an end-to-end method to integrate visual context elements in a multimodal skip-gram model. We provide experiments and extensive analysis of the obtained results.
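The page carries no code, but a small illustrative sketch may help make the idea concrete: a standard skip-gram objective (a word embedding predicting its surrounding words, with negative sampling) extended by a term that ties the same embedding to a visual-context feature vector, e.g. CNN features of the scene in which the word's referent appears. Everything below (the class and parameter names, the cosine-based visual term, the mixing weight alpha) is a hypothetical PyTorch sketch in the spirit of the abstract, not the authors' implementation.

    # Hypothetical multimodal skip-gram sketch (illustrative, not the paper's code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultimodalSkipGram(nn.Module):
        def __init__(self, vocab_size: int, dim: int = 100, visual_dim: int = 2048):
            super().__init__()
            self.in_embed = nn.Embedding(vocab_size, dim)   # target-word vectors
            self.out_embed = nn.Embedding(vocab_size, dim)  # context-word vectors
            # Projects word vectors into the visual feature space (e.g. CNN features).
            self.visual_proj = nn.Linear(dim, visual_dim)

        def text_loss(self, target, context, negatives):
            # Skip-gram with negative sampling over textual context words.
            v = self.in_embed(target)                        # (B, dim)
            pos = (v * self.out_embed(context)).sum(-1)      # (B,)
            neg = torch.bmm(self.out_embed(negatives),       # (B, K, dim)
                            v.unsqueeze(-1)).squeeze(-1)     # (B, K)
            return -(F.logsigmoid(pos) + F.logsigmoid(-neg).sum(-1)).mean()

        def visual_loss(self, target, visual_ctx):
            # Push the projected word vector toward the visual-context feature.
            pred = self.visual_proj(self.in_embed(target))   # (B, visual_dim)
            return 1.0 - F.cosine_similarity(pred, visual_ctx).mean()

        def forward(self, target, context, negatives, visual_ctx, alpha=0.5):
            # Joint objective: textual skip-gram term plus a weighted visual term.
            return self.text_loss(target, context, negatives) \
                 + alpha * self.visual_loss(target, visual_ctx)

    # Toy usage with random indices and a stand-in scene-feature vector.
    model = MultimodalSkipGram(vocab_size=10000)
    tgt = torch.randint(0, 10000, (4,))
    ctx = torch.randint(0, 10000, (4,))
    neg = torch.randint(0, 10000, (4, 5))
    vis = torch.randn(4, 2048)   # e.g. CNN features of the surrounding scene
    loss = model(tgt, ctx, neg, vis)

The design is simply additive: the textual term keeps embeddings predictive of linguistic context, the visual term pulls them toward scene-level features, and alpha trades the two off.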


Similar Articles

Deriving continuous grounded meaning representations from referentially structured multimodal contexts

Corpora of referring expressions paired with their visual referents are a good source for learning word meanings directly grounded in visual representations. Here, we explore additional ways of extracting word representations from them, linked to multi-modal context: through expressions that refer to the same object, and through expressions that refer to different objects in the same scene. We s...


Deep embodiment: grounding semantics in perceptual modalities

Multi-modal distributional semantic models address the fact that text-based semantic models, which represent word meanings as a distribution over other words, suffer from the grounding problem. This thesis advances the field of multi-modal semantics in two directions. First, it shows that transferred convolutional neural network representations outperform the traditional bag of visual words met...


Improving Word Representations via Global Visual Context

Visually grounded semantics is a very important aspect in word representation, largely due to its potential to improve many NLP tasks such as information retrieval, text classification and analysis. We present a new distributed word learning framework which 1) learns word embeddings that better capture the visually grounded semantics by unifying local document context and global visual context,...


Learning how to Learn: An Adaptive Dialogue Agent for Incrementally Learning Visually Grounded Word Meanings

We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained using Reinforcement Learning (RL), must be able to handle natural conversations with human users, and achieve good learning performance (i.e. accuracy) whil...


VOILA: An Optimised Dialogue System for Interactively Learning Visually-Grounded Word Meanings (Demonstration System)

We present VOILA: an optimised, multimodal dialogue agent for interactive learning of visually grounded word meanings from a human user. VOILA is: (1) able to learn new visual categories interactively from users from scratch; (2) trained on real human-human dialogues in the same domain, and so is able to conduct natural spontaneous dialogue; (3) optimised to find the most effective trade-off be...



Journal:
  • CoRR

Volume: abs/1711.03483  Issue:

Pages: -

Publication date: 2017